Clustering of football players based on performance data and aggregated clustering validity indexes
Abstract
We analyse football (soccer) player performance data with mixed type variables from the 2014-15 season of eight European major leagues. We cluster these data based on a tailor-made dissimilarity measure. In order to decide between the many available clustering methods and to choose an appropriate number of clusters, we use the approach by Akhanli and Hennig (2020. “Comparing Clusterings and Numbers of Clusters by Aggregation of Calibrated Clustering Validity Indexes.” Statistics and Computing 30 (5): 1523–44). This approach is based on several validation criteria that refer to different desirable characteristics of a clustering. These characteristics are chosen based on the aim of clustering, and this allows us to define a suitable validation index as a weighted average of calibrated individual indexes measuring the desirable features. We derive two different clusterings. The first one is a partition of the data set into major groups of essentially different players, which can be used for the analysis of a team’s composition. The second one divides the data set into many small clusters (with 10 players on average), which can be used for finding players with a very similar profile to a given player. It is discussed in depth what characteristics are desirable for these clusterings. Weighting the criteria for the second clustering is informed by a survey of football experts.
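The idea of a validation index built as a weighted average of calibrated individual indexes can be illustrated with a minimal Python sketch. This is not the authors' implementation: the z-score calibration against reference clusterings, the function names `calibrate` and `aggregate`, the index names, and all numbers are assumptions made for illustration; the exact calibration scheme is the one described in Akhanli and Hennig (2020).

```python
from statistics import mean, pstdev


def calibrate(candidate_values, reference_values):
    """Z-score one index's values for the candidate clusterings.

    Assumption: calibration pools the candidate values with values of the
    same index computed on reference (e.g. random) clusterings.
    """
    pooled = list(candidate_values) + list(reference_values)
    mu = mean(pooled)
    sigma = pstdev(pooled)
    if sigma == 0:
        # All values identical: the index cannot discriminate.
        return [0.0 for _ in candidate_values]
    return [(v - mu) / sigma for v in candidate_values]


def aggregate(calibrated, weights):
    """Weighted average of calibrated indexes, one score per clustering."""
    total = sum(weights.values())
    n = len(next(iter(calibrated.values())))
    return [
        sum(weights[name] * calibrated[name][i] for name in weights) / total
        for i in range(n)
    ]


# Hypothetical raw index values for two candidate clusterings
# (larger is assumed to be better), plus values on reference clusterings.
raw = {
    "homogeneity": ([0.2, 0.8], [0.4, 0.6]),
    "separation": ([0.7, 0.5], [0.3, 0.5]),
}
weights = {"homogeneity": 2.0, "separation": 1.0}  # hypothetical weights

calibrated = {name: calibrate(c, r) for name, (c, r) in raw.items()}
scores = aggregate(calibrated, weights)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, "best candidate:", best)
```

In this sketch the weights encode how much each desirable characteristic matters for the aim of the clustering, which is the role the abstract assigns to the expert survey for the second clustering.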
Journal
Journal title: Journal of Quantitative Analysis in Sports
Year: 2023
ISSN: 2194-6388, 1559-0410
DOI: https://doi.org/10.1515/jqas-2022-0037